EXPLORER: Supporting Run-Time Parallelization of DO-ACROSS Loops on General Networks of Workstations
Authors
Abstract
Performing run-time parallelization on general networks of workstations (NOWs) without special hardware or system-software support is very difficult, especially for DOACROSS loops. With the high communication overhead on NOWs, run-time parallelization yields hardly any performance gain, because it requires a large number of messages for dependence detection, data accesses, and computation scheduling. In this paper, we introduce the EXPLORER system for run-time parallelization of DOACROSS and DOALL loops on general NOWs. EXPLORER hides the communication overhead on NOWs through multithreading, a facility supported on almost all workstations. A preliminary version of EXPLORER was implemented on a NOW consisting of eight DEC Alpha workstations connected through an Ethernet. The Pthread package was used to support multithreading. Experiments on synthetic loops showed speedups of up to 6.5 for DOACROSS loops and 7 for DOALL loops.
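The abstract gives no implementation details, so the following is only a minimal sketch of the kind of dependence-driven DOACROSS execution EXPLORER targets. It uses the Pthread primitives the paper mentions, but runs all "workers" as threads in a single process rather than on separate workstations; the loop body, the cyclic iteration assignment, and all identifiers (N, NTHREADS, ready[], worker()) are illustrative assumptions, not EXPLORER's actual interface.

/*
 * Toy DOACROSS loop  a[i] = a[i-1] + i  (cross-iteration dependence on a[i-1])
 * executed by several Pthreads with iterations assigned cyclically.
 * Each thread blocks until the producing iteration has published its result.
 * Compile with: cc -std=c99 -pthread doacross_sketch.c
 */
#include <pthread.h>
#include <stdio.h>

#define N        64
#define NTHREADS 4

static double          a[N];
static int             ready[N];   /* ready[i] = 1 once a[i] is valid */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

static void *worker(void *arg)
{
    long id = (long)arg;

    /* Cyclic assignment: thread id executes iterations id, id+NTHREADS, ... */
    for (long i = id; i < N; i += NTHREADS) {
        if (i > 0) {
            /* Wait for the iteration we depend on to finish. */
            pthread_mutex_lock(&lock);
            while (!ready[i - 1])
                pthread_cond_wait(&cond, &lock);
            pthread_mutex_unlock(&lock);
        }

        a[i] = (i > 0 ? a[i - 1] : 0.0) + (double)i;   /* loop body */

        /* Publish the result so the consumer of iteration i can proceed. */
        pthread_mutex_lock(&lock);
        ready[i] = 1;
        pthread_cond_broadcast(&cond);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];

    for (long id = 0; id < NTHREADS; id++)
        pthread_create(&t[id], NULL, worker, (void *)id);
    for (long id = 0; id < NTHREADS; id++)
        pthread_join(t[id], NULL);

    printf("a[N-1] = %.1f\n", a[N - 1]);   /* expect 0+1+...+(N-1) = 2016 */
    return 0;
}

On a real NOW the wait on ready[i-1] would be a message receive from another workstation; the point made in the abstract is that another thread on the same workstation can keep computing while such a receive is pending, hiding the communication latency.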
Similar resources
Effects of Parallelism Degree on Run-Time Parallelization of Loops
Due to the overhead of exploiting and managing parallelism, run-time loop parallelization techniques that aim to maximize parallelism may not necessarily lead to the best performance. In this paper, we present two parallelization techniques that exploit different degrees of parallelism for loops with dynamic cross-iteration dependences. The DOALL approach exploits iteration-level paralleli...
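The distinction between the two loop classes discussed above can be made concrete with a toy example (illustrative only; the arrays, sizes, and loop bodies below are not taken from the cited paper):

#include <stdio.h>

#define N 8

int main(void)
{
    double a[N] = {0}, b[N], c[N];

    for (int i = 0; i < N; i++)
        b[i] = (double)i;

    /* DOALL: iterations are independent and could all run in parallel. */
    for (int i = 0; i < N; i++)
        c[i] = 2.0 * b[i];

    /* DOACROSS: iteration i reads a[i-1], written by iteration i-1, so
     * iterations can only be overlapped partially (pipelined execution). */
    for (int i = 1; i < N; i++)
        a[i] = a[i - 1] + b[i];

    printf("c[N-1] = %.1f, a[N-1] = %.1f\n", c[N - 1], a[N - 1]);
    return 0;
}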
Local predecimation with range index communication parallelization strategy for fractal image compression on a cluster of workstations
In this paper, we have implemented and evaluated the performance of the local predecimation with range index (LPRI) communication parallelization strategy for fractal image compression on a Beowulf cluster of workstations. The strategy effectively balances the load among workstations. We have evaluated the execution time of LPRI, varying the number of workstations and the user-specified root mean square error...
Affine Transformations for Communication Minimized Parallelization and Locality Optimization of Arbitrarily Nested Loop Sequences
A long-running program often spends most of its time in nested loops. The polyhedral model provides powerful abstractions to optimize loop nests with regular accesses for parallel execution. Affine transformations in this model capture a complex sequence of execution-reordering loop transformations that improve performance by parallelization as well as better locality. Although a significant am...
The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization
Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. As parallelizable loops arise frequently in practice, we advocate a novel framework for their identification: speculatively execute the loop as a doall, and apply a fully parallel data dependence test to determine if it ha...
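The speculative strategy sketched in this abstract can be illustrated with a deliberately simplified shadow-array check. This is an assumption-laden toy, not the actual LRPD test, which is finer grained (it allows same-iteration read-after-write and uses privatization and reduction analysis to accept more loops); all identifiers below are illustrative.

#include <stdio.h>

#define N 16

static double a[N];
static int shadow_read[N];    /* element was read during the loop        */
static int shadow_writes[N];  /* number of writes to the element         */

/* One speculative iteration: reads a[r], writes a[w], marking shadows.
 * (Shown sequentially for brevity; the speculative run would be a DOALL.) */
static void spec_iteration(int r, int w)
{
    double v = a[r];
    shadow_read[r] = 1;
    a[w] = v + 1.0;
    shadow_writes[w]++;
}

/* Conservative post-pass: reject the speculation if any element was both
 * read and written, or written more than once, across the whole loop. */
static int passes_test(void)
{
    for (int k = 0; k < N; k++)
        if ((shadow_read[k] && shadow_writes[k]) || shadow_writes[k] > 1)
            return 0;
    return 1;
}

int main(void)
{
    /* Access pattern known only at run time; here iteration i reads a[i]
     * and writes a[i + N/2], so no element is touched twice and the
     * speculative DOALL execution is accepted. */
    for (int i = 0; i < N / 2; i++)
        spec_iteration(i, i + N / 2);

    printf("speculation %s\n", passes_test() ? "succeeded" : "failed");
    return 0;
}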
Efficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems
Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or many-threading platforms is a challenge due to the difficulty of scheduling hardware resources for concurrent threads. In this paper, to resolve this problem, a novel method is proposed that parallelizes the GA by designing three concurrent kernels, each of which runs some depe...